Automatic Diachronic Normalization of Polish Texts
Similar resources
Automatic Term Recognition in Polish Texts
Although ATR has been in the research focus for over a decade now, most approaches have been developed for highly positional languages, whereas only a few efforts have been made for Slavic languages which have a richer morphological inflection and a more relaxed word order, e.g., Vintar (2004) (for Slovene) and Nenadic et al. (2003) (for Serbian). In this paper, we report on our experiments in ...
Using Comparable Collections of Historical Texts for Building a Diachronic Dictionary for Spelling Normalization
In this paper, we argue that comparable collections of historical written resources can help overcoming typical challenges posed by heritage texts enhancing spelling normalization, POS-tagging and subsequent diachronic linguistic analyses. Thus, we present a comparable corpus of historical German recipes and show how such a comparable text collection together with the application of innovative ...
Measuring Readability of Polish Texts: Baseline Experiments
Measuring readability of a text is the first sensible step to its simplification. In this paper we present an overview of the most common approaches to automatic measuring of readability. Of the described ones, we implemented and evaluated: Gunning FOG index, Flesch-based Pisarek method. We also present two other approaches. The first one is based on measuring distributional lexical similarity ...
Terminology extraction from medical texts in Polish
BACKGROUND: Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need informat...
Rule-Based Normalization of Historical Texts
This paper deals with normalization of language data from Early New High German. We describe an unsupervised, rule-based approach which maps historical wordforms to modern wordforms. Rules are specified in the form of context-aware rewrite rules that apply to sequences of characters. They are derived from two aligned versions of the Luther bible and weighted according to their frequency. The eva...
Journal
Journal title: Investigationes Linguisticae
Year: 2018
ISSN: 1426-188X
DOI: 10.14746/il.2017.37.2.